A hybrid generative/discriminative approach to text classification with additional information
Authors
Abstract
This paper presents a classifier for text data samples, such as Web pages and technical papers, that consist of main text and additional components. We focus on multiclass, single-labeled text classification problems and design the classifier as a hybrid of probabilistic generative and discriminative approaches. Our formulation considers an individual generative model for each component and constructs the classifier by combining these trained models based on the maximum entropy principle. We use naive Bayes models as the component generative models for the main text and for additional components such as titles, links, and authors, so that our formulation can be applied to document and Web page classification problems. Our experimental results on four test collections confirmed that our hybrid approach effectively combined main text and additional components and thus improved classification performance. © 2006 Published by Elsevier Ltd.
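The combination step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: one multinomial naive Bayes model per component, combined in a log-linear (maximum-entropy style) model with one weight per component (`lam`) and one bias per class (`mu`), fitted by plain gradient ascent on the conditional log-likelihood. The function names, the toy data, and the hyperparameters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def train_nb(token_lists, labels, classes, alpha=1.0):
    """Fit a multinomial naive Bayes word model for ONE component."""
    vocab = {t for toks in token_lists for t in toks}
    counts = {c: Counter() for c in classes}
    for toks, y in zip(token_lists, labels):
        counts[y].update(toks)
    model = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)  # Laplace smoothing
        model[c] = {t: math.log((counts[c][t] + alpha) / total) for t in vocab}
    return model

def log_lik(model, tokens, c):
    """log p(tokens | c); out-of-vocabulary tokens are skipped (assumption)."""
    return sum(model[c][t] for t in tokens if t in model[c])

def component_scores(models, doc, classes, components):
    """Per-class list of component log-likelihoods for one document."""
    return {c: [log_lik(models[j], doc[j], c) for j in components] for c in classes}

def posterior(scores, lam, mu):
    """Log-linear combination: p(c | x) is proportional to
    exp(mu[c] + sum_j lam[j] * log p(x_j | c))."""
    z = {c: mu[c] + sum(w * s for w, s in zip(lam, s_c)) for c, s_c in scores.items()}
    m = max(z.values())  # subtract the max for numerical stability
    e = {c: math.exp(v - m) for c, v in z.items()}
    total = sum(e.values())
    return {c: v / total for c, v in e.items()}

def fit_combiner(all_scores, labels, classes, n_comp, lr=0.05, epochs=200):
    """Gradient ascent on the conditional log-likelihood (assumed training rule)."""
    lam = [1.0] * n_comp
    mu = {c: 0.0 for c in classes}
    n = len(labels)
    for _ in range(epochs):
        g_lam = [0.0] * n_comp
        g_mu = {c: 0.0 for c in classes}
        for scores, y in zip(all_scores, labels):
            p = posterior(scores, lam, mu)
            for c in classes:
                d = (1.0 if c == y else 0.0) - p[c]  # observed minus expected
                g_mu[c] += d
                for j in range(n_comp):
                    g_lam[j] += d * scores[c][j]
        lam = [w + lr * g / n for w, g in zip(lam, g_lam)]
        mu = {c: mu[c] + lr * g_mu[c] / n for c in classes}
    return lam, mu

# Hypothetical toy collection: each document has a main text and a title.
components = ["main", "title"]
classes = ["sports", "finance"]
docs = [
    {"main": ["goal", "match", "team"], "title": ["football"]},
    {"main": ["team", "score", "win"], "title": ["football"]},
    {"main": ["stock", "market", "price"], "title": ["finance"]},
    {"main": ["market", "trade", "bank"], "title": ["finance"]},
]
labels = ["sports", "sports", "finance", "finance"]

# Step 1: train one generative model per component.
models = {j: train_nb([d[j] for d in docs], labels, classes) for j in components}
# Step 2: fit the discriminative combination weights.
train_scores = [component_scores(models, d, classes, components) for d in docs]
lam, mu = fit_combiner(train_scores, labels, classes, len(components))

def classify(doc):
    p = posterior(component_scores(models, doc, classes, components), lam, mu)
    return max(p, key=p.get)

print(classify({"main": ["team", "goal"], "title": ["football"]}))
```

In this sketch the combiner is fitted on the same samples used to train the component models, which tends to overstate how reliable each component is; a more careful setup would estimate the combination weights on held-out or cross-validated scores.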
Similar resources
A Hybrid Generative/Discriminative Approach to Semi-Supervised Classifier Design
Semi-supervised classifier design that simultaneously utilizes both labeled and unlabeled samples is a major research issue in machine learning. Existing semi-supervised learning methods belong to either generative or discriminative approaches. This paper focuses on probabilistic semi-supervised classifier design and presents a hybrid approach to take advantage of the generative and discriminati...
Hybrid Discriminative-Generative Approach with Gaussian Processes
Machine learning practitioners are often faced with a choice between a discriminative and a generative approach to modelling. Here, we present a model based on a hybrid approach that breaks down some of the barriers between the discriminative and generative points of view, allowing continuous dimensionality reduction of hybrid discrete-continuous data, discriminative classification with missing ...
Exploiting Unlabelled Data for Hybrid Object Classification
We propose a semi-supervised learning algorithm for visual object categorization which utilizes statistical information from unlabelled data to increase classification performance. We build on an earlier hybrid generative-discriminative approach by Holub et al. [6] which extracts Fisher scores from generative models. The hybrid model allows us to combine the modelling power and flexibility of g...
Terms-based discriminative information space for robust text classification
With the popularity of Web 2.0, there has been a phenomenal increase in the utility of text classification in applications like document filtering and sentiment categorization. Many of these applications demand that the classification method be efficient and robust, yet produce accurate categorizations by using the terms in the documents only. In this paper, we propose a novel and efficient met...
Semi-Supervised Learning for Multi-Component Data Classification
This paper presents a method for designing a semi-supervised classifier for multi-component data such as web pages consisting of text and link information. The proposed method is based on a hybrid of generative and discriminative approaches to take advantage of both approaches. With our hybrid approach, for each component, we consider an individual generative model trained on labeled samples and...
Journal title:
- Inf. Process. Manage.
Volume 43, Issue
Pages -
Publication date: 2007